
# LLM2Graph: Dynamic Knowledge Graph Construction and Evaluation Framework

## Overview
This package implements the core methodology of the following paper, currently under review at COLM 2025:

> *The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning*

It enables:
1. Non-deterministic knowledge graph construction through LLM-driven knowledge elicitation.
2. Multi-hop, alias-aware, and compositional query generation for stress-testing unlearning methods.
3. Automated creation of entity-specific, reasoning-diverse evaluation datasets.

## Key Contributions (Paper-Aligned)
- Dynamically extract model-internal knowledge (pre-unlearning).
- Build structured entity-specific knowledge graphs via controlled BFS.
- Generate single-hop, multi-hop, and alias-based queries.
- Enable fine-grained control over reasoning complexity and surface perturbations.
- Automatically benchmark retention, unintended forgetting, and adversarial resilience.

---

## Core Features

### 1. **Entity to Graph**
- Non-deterministic knowledge elicitation using LLMs (GPT-4o, Gemini, or a local model).
- Graph Expansion via BFS with:
   - Adjustable max-depth
   - Decay factor controlling node expansion
   - API and token budget constraints
- Alias and relationship resolution via an LLM and fuzzy matching.
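
How depth, decay, and the call budget interact can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `bfs_with_decay`, `expand_fn`, and `max_calls` are hypothetical names, `expand_fn` stands in for the LLM elicitation call, and the rule `batch_size * decay_factor ** depth` for the per-node cap is an assumption about how the decay factor might be applied.

```python
from collections import deque
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]


def bfs_with_decay(
    seed: str,
    expand_fn: Callable[[str], List[Triple]],  # hypothetical stand-in for the LLM call
    max_depth: int = 3,
    batch_size: int = 10,
    decay_factor: float = 0.8,
    max_calls: int = 100,  # hypothetical API budget
) -> List[Triple]:
    """Breadth-first expansion in which deeper nodes may add fewer neighbors."""
    triples: List[Triple] = []
    visited: Set[str] = {seed}
    queue = deque([(seed, 0)])
    calls = 0
    while queue and calls < max_calls:
        entity, depth = queue.popleft()
        if depth >= max_depth:
            continue  # nodes at the depth limit are kept but not expanded
        # Assumed decay rule: the per-node cap shrinks geometrically with depth.
        cap = max(1, int(batch_size * decay_factor ** depth))
        calls += 1
        for subj, rel, obj in expand_fn(entity)[:cap]:
            triples.append((subj, rel, obj))
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, depth + 1))
    return triples


# Toy knowledge source used in place of a real model.
toy = {
    "A": [("A", "r", "B"), ("A", "r", "C"), ("A", "r", "D")],
    "B": [("B", "r", "E"), ("B", "r", "F")],
}
result = bfs_with_decay(
    "A", lambda e: toy.get(e, []), max_depth=2, batch_size=2, decay_factor=0.5
)
```

With these toy settings the seed may add two neighbors, while each depth-1 node may add only one, so `D` and `F` are never reached.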

### 2. **Text to Graph**
- Triplet Extraction: (Subject, Relation, Object) tuples from free-form text using an LLM.
- Flexible Expansion Control: Selectively expand specific entities.
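
Once the model has returned its answer, the raw text still has to be turned into tuples. The sketch below shows one way to do that, assuming the model was prompted to emit one `(Subject, Relation, Object)` per line. That output format and the `parse_triples` helper are assumptions for illustration, not the package's actual parser.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumed output format: one "(Subject, Relation, Object)" per line.
_TRIPLE_RE = re.compile(r"^\s*\(\s*(.+?)\s*,\s*(.+?)\s*,\s*(.+?)\s*\)\s*$")


def parse_triples(llm_output: str) -> List[Triple]:
    """Keep well-formed triples, skip other lines, and drop exact duplicates."""
    seen = set()
    triples: List[Triple] = []
    for line in llm_output.splitlines():
        match = _TRIPLE_RE.match(line)
        if not match:
            continue  # ignore chatter or malformed lines
        triple = (match.group(1), match.group(2), match.group(3))
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


sample = """Here are the triples:
(Stephen King, wrote, The Shining)
(Stephen King, born in, Portland)
(Stephen King, wrote, The Shining)
not a triple
"""
parsed = parse_triples(sample)
```

A line-based pattern like this splits on the first two commas, so a subject or relation that itself contains a comma would be misparsed; a structured format such as JSON would be more robust if the model can be made to produce it reliably.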

### 3. **Ground Truth Validation (MUSE)**
- Extract triples from ground-truth documents.
- Compare elicited knowledge to external gold-standard corpora.
- Compute Precision, Recall, F1-Score.
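
As a rough illustration of the metric step, the sketch below scores elicited triples against gold triples by set overlap. It assumes exact matching after lowercasing and whitespace trimming; the package's `compute_evaluation_metrics` may match differently (for example, through alias resolution), and `triple_prf` is a hypothetical name.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and trim each field so trivial surface differences do not count."""
    subj, rel, obj = triple
    return (subj.strip().lower(), rel.strip().lower(), obj.strip().lower())


def triple_prf(
    predicted: Iterable[Triple], gold: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over sets of normalized triples."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


elicited = [("Stephen King", "wrote", "The Shining"), ("Stephen King", "wrote", "Dune")]
reference = [("stephen king", "wrote", "the shining"), ("Stephen King", "wrote", "It")]
precision, recall, f1 = triple_prf(elicited, reference)
```

Here one of two elicited triples appears in the reference set and one of two reference triples is recovered, so precision, recall, and F1 are all 0.5.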

### 4. **Dynamic Query Generation**
- Automatically generates evaluation probes:
   - Single-Hop Queries
   - Multi-Hop Queries
   - Alias-perturbed Queries
- Fine-grained control over query difficulty.
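
The three probe types can be illustrated with simple string templates. This is a sketch only: the question templates, the helper names, and the alias table are assumptions for illustration, and the package's `Graph2TextWithTemplates` is driven by a config file whose format is not shown here.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
QA = Tuple[str, str]  # (question, expected answer)


def single_hop(triples: List[Triple]) -> List[QA]:
    """One question per edge in the graph."""
    return [(f"What is the {rel} of {subj}?", obj) for subj, rel, obj in triples]


def two_hop(triples: List[Triple]) -> List[QA]:
    """Chain edges a -> b -> c so the intermediate entity b is never named."""
    by_subject: Dict[str, List[Tuple[str, str]]] = {}
    for subj, rel, obj in triples:
        by_subject.setdefault(subj, []).append((rel, obj))
    queries: List[QA] = []
    for subj, rel1, mid in triples:
        for rel2, obj in by_subject.get(mid, []):
            queries.append((f"What is the {rel2} of the {rel1} of {subj}?", obj))
    return queries


def alias_perturb(queries: List[QA], aliases: Dict[str, List[str]]) -> List[QA]:
    """Rewrite each question once per known alias of an entity it mentions."""
    out: List[QA] = []
    for question, answer in queries:
        for name, alternatives in aliases.items():
            if name in question:
                out.extend((question.replace(name, alt), answer) for alt in alternatives)
    return out


graph = [
    ("Stephen King", "spouse", "Tabitha King"),
    ("Tabitha King", "birthplace", "Old Town"),
]
hop1 = single_hop(graph)
hop2 = two_hop(graph)
perturbed = alias_perturb(hop2, {"Stephen King": ["Richard Bachman"]})
```

Difficulty can then be varied by choosing the number of hops and whether the target entity appears under its canonical name or an alias.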

---

## Paper Claims Operationalized
| Feature                                     | Supported  |
|---------------------------------------------|------------|
| BFS Graph Expansion with Decay              | ✅         |
| LLM-guided Triple Extraction                | ✅         |
| Alias Resolution                            | ✅         |
| Relevance Filtering                         | ✅         |
| Multi-hop Query Generation                  | ✅         |
| Retention vs Forget Evaluation              | ✅         |
| Automatic Benchmark Generation              | ✅         |
| No Manual Data Curation Needed              | ✅         |

---

## Main Components

### Graph Construction
```python
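# Top-level `await` works in a notebook or `python -m asyncio`; in a script, wrap the call in asyncio.run(...).
await build_graph_bfs_async_with_decay(initial_query="Stephen King", max_depth=3, batch_size=10, decay_factor=0.8)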
```

### Ground Truth Validation
```python
ground_truth_triples = extract_triples_from_ground_truth(ground_truth_text)
precision, recall, f1 = compute_evaluation_metrics(graph, ground_truth_triples)
```

### Multi-hop & Alias Query Generation
```python
g2t = Graph2TextWithTemplates(path_to_config)
g2t.generate_dataset()
```

---

## Installation
```bash
bash setup.sh
source venv/bin/activate
```

Alternatively, use the Makefile target:

```bash
make install
```

---



## Evaluation Pipeline
1. Build Graph:
```bash
python experiments/build_graph_for_entity.py
```

2. Generate QA Dataset:
```bash
python experiments/generate_templated_qa_dataset.py
```

3. Run Tests:
```bash
make test
```


## Citation
> Anonymous Authors, "The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning", Under Review at COLM 2025.
